[perl] Efficient processing of large text
Posted by jesper on Stack Overflow, published 2010-04-06.
I have a text file that contains over one million URLs. I need to process this file and assign the URLs to groups based on their host address:
{
  'http://www.ex1.com' => ['http://www.ex1.com/...', 'http://www.ex1.com/...', ...],
  'http://www.ex2.com' => ['http://www.ex2.com/...', 'http://www.ex2.com/...', ...],
}
My current basic solution takes about 600 MB of RAM to do this (the file itself is about 300 MB). Could you suggest more memory-efficient ways? My current solution simply reads the file line by line, extracts the host address with a regex, and pushes each URL into a hash of arrays.
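For reference, a minimal sketch of the line-by-line approach described above; the input file name urls.txt and the exact host-matching regex are assumptions, not taken from the original post:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read the URL list line by line, extract the scheme-plus-host part
    # with a regex, and push each URL onto an array keyed by its host.
    my %groups;

    open my $fh, '<', 'urls.txt' or die "Cannot open urls.txt: $!";
    while (my $url = <$fh>) {
        chomp $url;
        next unless length $url;

        # Capture e.g. 'http://www.ex1.com' from 'http://www.ex1.com/some/path'.
        if ($url =~ m{^(https?://[^/]+)}i) {
            push @{ $groups{$1} }, $url;
        }
    }
    close $fh;

    # Example use of the resulting structure: count URLs per host.
    for my $host (sort keys %groups) {
        printf "%s => %d urls\n", $host, scalar @{ $groups{$host} };
    }

Since every full URL is kept in memory inside %groups, the hash ends up larger than the input file itself, which matches the roughly 600 MB figure mentioned above.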
© Stack Overflow or respective owner